Then, we use efficient XNOR and Bit-count operations to replace real-valued operations.

Following [199], the forward process of the BNN is

$$a_i = b^a_{i-1} \odot b^w_i, \qquad (6.7)$$

where $\odot$ represents the efficient XNOR and Bit-count operations. Based on XNOR-Net, we

introduce a learnable channel-wise scale factor to modulate the amplitude of real-valued

convolution. Aligned with the Batch Normalization (BN) and activation layers, the 1-bit

convolution is formulated as

$$b^a_i = \mathrm{sign}\big(\Phi(\alpha_i \circ b^a_{i-1} \odot b^w_i)\big). \qquad (6.8)$$

In KR-GAL, the original output feature $a_i$ is first scaled by a channel-wise scale factor (vector) $\alpha_i \in \mathbb{R}^{C_i}$ to modulate the amplitude of the real-valued counterparts. It then enters $\Phi(\cdot)$, which represents a composite function built by stacking several layers, e.g., a BN layer, a non-linear activation layer, and a max-pooling layer. The output is then binarized with the sign function to obtain the binary activations $b^a_i \in \mathbb{B}^{n_i}$, where $\mathrm{sign}(\cdot)$ returns $+1$ if the input is greater than zero and $-1$ otherwise. The 1-bit activation $b^a_i$ can then be used for the efficient XNOR and Bit-count operations of the $(i+1)$-th layer.
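To make the dataflow of Eqs. (6.7)-(6.8) concrete, the following is a minimal PyTorch sketch of a 1-bit convolution layer. It is not the reference implementation: the XNOR and Bit-count kernel is emulated by a standard convolution over $\{-1,+1\}$ tensors, $\Phi(\cdot)$ is instantiated here as BN followed by ReLU, and the names (`SignSTE`, `BinaryConv2d`) and the straight-through estimator used for backpropagation are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignSTE(torch.autograd.Function):
    """sign(.) returning +1/-1, with a straight-through estimator backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x > 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Clipped STE: pass gradients only where |x| <= 1.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class BinaryConv2d(nn.Module):
    """1-bit convolution of Eq. (6.8): b^a_i = sign(Phi(alpha_i o (b^a_{i-1} xnor b^w_i)))."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        # Learnable channel-wise scale factor alpha_i in R^{C_i}.
        self.alpha = nn.Parameter(torch.ones(out_ch))
        self.bn = nn.BatchNorm2d(out_ch)
        self.stride, self.padding = stride, padding

    def forward(self, b_a_prev):
        b_w = SignSTE.apply(self.weight)  # binarized kernels b^w_i
        # Over {-1,+1} tensors, XNOR + Bit-count is numerically a plain convolution (Eq. 6.7).
        a = F.conv2d(b_a_prev, b_w, stride=self.stride, padding=self.padding)
        a = a * self.alpha.view(1, -1, 1, 1)   # amplitude modulation by alpha_i
        a = F.relu(self.bn(a))                 # Phi(.): BN + non-linear activation
        return SignSTE.apply(a)                # 1-bit activations for the (i+1)-th layer
```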

However, the gap in representational capability between $w_i$ and $b^w_i$ could lead to a large quantization error. We aim to minimize this gap to reduce the quantization error while increasing the binarized kernels' ability to provide information gains. Therefore, $\alpha_i$ is also used to reconstruct $w_i$ from $b^w_i$. This learnable scale factor leads to a learning process that estimates the convolutional filters more precisely by minimizing an adversarial loss. Discriminators $D(\cdot)$ with weights $W_D$ are introduced to distinguish the unbinarized kernels $w_i$ from the reconstructed ones $\alpha_i \circ b^w_i$. Therefore, $\alpha_i$ and $W_D$ are learned by solving the following optimization problem:

$$\arg\min_{w_i,\, b^w_i,\, \alpha_i}\ \max_{W_D}\ \mathcal{L}^K_{Adv}(w_i, b^w_i, \alpha_i, W_D) + \mathcal{L}^K_{MSE}(w_i, b^w_i, \alpha_i), \quad i \in \{1, \ldots, N\}, \qquad (6.9)$$

where $\mathcal{L}^K_{Adv}(w_i, b^w_i, \alpha_i, W_D)$ is the adversarial loss, defined as

$$\mathcal{L}^K_{Adv}(w_i, b^w_i, \alpha_i, W_D) = \log(D(w_i; W_D)) + \log(1 - D(b^w_i \circ \alpha_i; W_D)), \qquad (6.10)$$

where $D(\cdot)$ consists of several basic blocks, each with a fully connected layer and a LeakyReLU layer. In addition, we employ discriminators to refine every binarized convolution layer during the binarization training process.
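As an illustration of how the min-max game of Eqs. (6.9)-(6.10) can be set up, the sketch below implements a kernel discriminator and the adversarial loss in PyTorch. The hidden width, the sigmoid output, and the per-output-channel flattening of the kernels are our assumptions; in training, the discriminator weights $W_D$ maximize this loss while $w_i$ and $\alpha_i$ minimize it, with the two updates alternated.

```python
import torch
import torch.nn as nn


class KernelDiscriminator(nn.Module):
    """D(.; W_D): stacked blocks of a fully connected layer followed by LeakyReLU."""

    def __init__(self, num_kernel_params, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_kernel_params, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, kernels_flat):
        return self.net(kernels_flat)


def kr_gal_adv_loss(D, w_i, b_w_i, alpha_i, eps=1e-8):
    """Eq. (6.10): log D(w_i; W_D) + log(1 - D(b^w_i o alpha_i; W_D)).

    Each output-channel filter is treated as one sample for D.
    """
    real = D(w_i.flatten(1))                                   # unbinarized kernels
    fake = D((alpha_i.view(-1, 1, 1, 1) * b_w_i).flatten(1))   # reconstructed kernels
    return torch.log(real + eps).mean() + torch.log(1.0 - fake + eps).mean()
```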

Furthermore, $\mathcal{L}^K_{MSE}(w_i, b^w_i, \alpha_i)$ is the kernel loss between the learned real-valued filters $w_i$ and the binarized filters $b^w_i$, which is expressed by MSE as

$$\mathcal{L}^K_{MSE}(w_i, b^w_i, \alpha_i) = \frac{\lambda}{2}\,\|w_i - \alpha_i \circ b^w_i\|_2^2, \qquad (6.11)$$

where the MSE is used to narrow the gap between the real-valued $w_i$ and the binarized $b^w_i$, and $\lambda$ is a balancing hyperparameter.
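The kernel reconstruction term of Eq. (6.11) is straightforward; a possible PyTorch form is shown below, where the value of $\lambda$ is only a placeholder. During training, this term would be added to the adversarial loss of Eq. (6.10) in the generator's minimization step, per Eq. (6.9).

```python
def kernel_mse_loss(w_i, b_w_i, alpha_i, lam=1e-4):
    """Eq. (6.11): (lambda / 2) * || w_i - alpha_i o b^w_i ||_2^2."""
    recon = alpha_i.view(-1, 1, 1, 1) * b_w_i   # channel-wise rescaled binary kernels
    return 0.5 * lam * (w_i - recon).pow(2).sum()
```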

6.2.3 Feature Refining Generative Adversarial Learning (FR-GAL)

We introduce generative adversarial learning (GAL) to refine the low-level features through self-supervision. We employ the high-level feature with abundant semantic information $a_H \in \mathbb{R}^{m_H}$ to supervise the low-level feature $a_L \in \mathbb{R}^{m_L}$, where $m_H = C_H \cdot W_H \cdot H_H$ and $m_L = C_L \cdot W_L \cdot H_L$. To keep the channel dimension identical, we first employ a $1 \times 1$ convolution to reduce $C_H$ to $C_L$ as

$$a'_H = f(W_{1\times 1} \otimes a_H), \qquad (6.12)$$
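A minimal sketch of the channel alignment in Eq. (6.12) is given below: a 1 × 1 convolution reduces the $C_H$ channels of the high-level feature to $C_L$. The excerpt does not specify $f(\cdot)$, so a ReLU is assumed here, and the module name is ours.

```python
import torch.nn as nn


class ChannelAlign(nn.Module):
    """Eq. (6.12): a'_H = f(W_{1x1} conv a_H), reducing C_H channels to C_L."""

    def __init__(self, c_high, c_low):
        super().__init__()
        self.conv1x1 = nn.Conv2d(c_high, c_low, kernel_size=1, bias=False)
        self.f = nn.ReLU(inplace=True)  # f(.): assumed nonlinearity, not specified in the text

    def forward(self, a_high):
        # a_high: (N, C_H, H_H, W_H) high-level feature map
        return self.f(self.conv1x1(a_high))
```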